ABSTRACT

Data mining is the process of extracting useful information
from large amounts of raw data that are present in various forms.
It is studied as part of the Knowledge Discovery in Databases
(KDD) process. In this research work, the accuracies of several
widely used classification algorithms are calculated. A
comparative analysis of the classification algorithms is carried
out using criteria such as accuracy, execution time (in seconds)
and the number of instances that are classified correctly or
incorrectly.
Data mining is the process of exploring patterns,
with the help of various techniques, in data
gathered from various sources [1]. Data mining
also involves selection of the relevant data from the
database, preprocessing of the relevant data,
transformation into a suitable form, mining and
evaluation of the data, and afterwards online updating
and visualization [1]. It is the analysis step of the
“Knowledge Discovery” process. The actual task of
data mining is the semi-automatic or automatic
analysis of large batches of data to extract
previously unknown patterns, unusual records and
dependencies [1]. The Knowledge Discovery process
includes several steps which help in the efficient
extraction of useful data from large datasets. These
steps are sequential and are repeated iteratively
until useful information is extracted.
Data mining is one of the essential steps in the KDD
process [2].


Step 1: Selection Step: In the first step, data suitable
for the investigation task is fetched from the database
[3]. The target dataset is formed from this extracted
data [2].


Step 2: Pre-Processing Step: The data collected in the
selection step typically suffers from problems such as
vagueness, missing values and irrelevant data, owing to
its large size and complexity. In the second step, these
problems are addressed and the data is molded into a form
suitable for data mining techniques with the help of
the different tools used for data mining [2].


[Flowchart: Selection Step -> Pre-Processing Step -> Transformation Step -> Data Mining Step (pattern extraction using Classification, Clustering algorithms) -> Interpretation/Evaluation Step]

Figure 1: Sequential Steps of KDD Process
Step 3: Transformation Step: In the third step, the data
is molded into a form suitable for classification by
performing operations such as aggregation, induction,
normalization, discretization and feature
construction [2] [3]. The WEKA tool is used for this
research work.


Step 4: Data Mining: In the fourth step, data mining
techniques (algorithms) are used to extract patterns
and to analyze the dataset [2] [3]. In this work, the
classification algorithms J48, Random Tree, Naive
Bayes and Multilayer Perceptron are used for the
investigation, using the WEKA machine learning tool.


Step 5: Interpretation/Evaluation Step: In this step,
data patterns are identified on the basis of some
measures. To understand and interpret the mining
results correctly, users need a visualization approach
to work with [2].
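
The pre-processing and transformation steps can be illustrated with a small sketch. It is not the WEKA procedure used in this work; it only assumes that missing numeric values are replaced by the mean of the observed values, that normalization means min-max scaling, and that discretization means equal-width binning. The function names and the sample values are hypothetical.

```python
from statistics import mean


def impute_missing(values):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Equal-width binning: map each value to a bin index 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Hypothetical blood pressure readings with one missing value.
raw = [60.0, None, 80.0, 100.0]
filled = impute_missing(raw)        # pre-processing step
scaled = min_max_normalize(filled)  # transformation: normalization
binned = discretize(filled, 2)      # transformation: discretization
```

Mean imputation is only one possible choice; the most frequent value is the usual counterpart for nominal attributes.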


II. RELATED WORK 

K. Ahmed, T. Jesmin, 2014, analyze the accuracy of
data mining algorithms using three testing beds: the
Percentage Split method, the Training Data Set method
and the Cross Validation method. The classification is
performed on a type-2 diabetes dataset. According to
this paper, the top 5 algorithms for classifying
diabetes patients include Bagging (accuracy 85%),
Logistic and Multiclass Classifier (accuracy 81.82%) [4].


C. Anuradha, T. Velmurugan, 2015, predict the final
year results of undergraduate (UG) students from a
student dataset. Cross-fold validation and percentage
split are the two testing beds used in the
classification. According to the research, Naive Bayes
and Bayes Net perform well on the dataset, while K-NN
and OneR perform poorly [5].


S. Gupta, N. Verma, 2014, analyze classification
algorithms on the basis of the Mean Absolute Error,
the Root Mean Squared Error and the Confusion Matrix.
The performance evaluation is done on the Naive Bayes
classifier; according to the research, the Mean
Absolute Error and the Root Mean Squared Error are
lower in the case of the training data set. According
to the evaluated results, Naive Bayes comes out to be
the best suited algorithm [6].


R. Sharma et al, 2015, comparatively analyze various
data mining algorithms using criteria such as accuracy,
execution time, different datasets and their
applications. The algorithms compared are the M5P
algorithm, the K Star algorithm, the M5 Rule algorithm
and the Multilayer Perceptron algorithm. For the large
dataset, K Star comes out with the highest accuracy [7].


N. Orsu et al, 2013, describe different classification
algorithms and compare them on micro-array data that
helps in predicting the occurrence of tumors. The
authors compare 14 different classification algorithms
on the basis of accuracy. According to the research
work, all classifiers achieve significant performance
in terms of accuracy [8].


S. Khare, S. Kashyap, 2015, analyze different
classification algorithms, including decision tree,
Bayesian network, k-nearest neighbor classifiers and
artificial neural networks. A brief description of
data mining and classification is given in the paper.
The Voting dataset is used for the analysis. According
to the research work, the accuracy of the decision
tree is better than that of the other algorithms [9].


Md. N. Amin, Md. A. Habib, 2015, worked on the
comparative analysis of the J48 decision tree,
multilayer perceptron and naive Bayes. According to
the authors, the best algorithm is J48 with an
accuracy of 97.61%, and the algorithm with the lowest
error rate, 27.91%, is Naive Bayes [10].


S. Carl et al, 2016, worked on the comparative
analysis of data mining algorithms, namely the k-means
algorithm, the k-nearest neighbor algorithm, the
decision tree algorithm and the naive Bayes algorithm.
The authors found that the k-means algorithm has a
lower error rate and is an easier algorithm compared
to KNN and Bayesian [11].


S. Vijayarani, M. Muthulakshmi, 2013, worked on
the performance analysis of Bayesian and lazy
algorithms. Various performance factors such as ROC
area, Kappa statistic and TP rate are used for the
analysis. From the comparison it can be concluded
that lazy classifiers are more efficient than Bayesian
classifiers [12].


S. Nikam, 2015, worked on the comparative analysis
of classification algorithms such as C4.5, ID3,
k-nearest neighbor, Naive Bayes, SVM and ANN. Each
algorithm has its own limitations and features, and
the best suited algorithm for a dataset can be chosen
based on the conditions [13].


G. Raj et al, 2018, present a comparative analysis of
classification algorithms using WEKA on
hematological data of diabetic patients. The
algorithms studied are the J48 decision tree, ZeroR
and Naive Bayes. From this comparison it can be
concluded that Naive Bayes is the best algorithm on
the diabetic data, with 76.3021% accuracy. The Naive
Bayes classifier can be used to enhance the
traditional classification methods used in the
medical or bioinformatics areas [14].


N. Jagtap et al, 2017, provided a comprehensive 
analysis of different classification algorithms like 
Support Vector Machines, Bayesian Networks, 
Genetic Algorithms, Fuzzy Logic etc. The 
comparative study of the algorithms is done on the 
basis of the advantages and disadvantages of the 
algorithms [15]. 


N. Nithya et al, 2014, compare the Logistic, Simple
Logistic and SMO algorithms on the basis of accuracy,
TP rate, FP rate, precision, Kappa statistic, etc.
According to the analysis, the Logistic method is the
best suited of the function classifier algorithms,
but with respect to time SMO produces the best result
[16].


S. Chiranjibi, 2015, worked on the comparative
analysis of Naive Bayes, Bayes Network, Logistic,
Decision Tree, Multilayer Perceptron, REPTree,
ZeroR and AdaBoost. From the work it can be concluded
that the logistic algorithm is the best, and that it
works well for a higher number of attributes and a
higher number of instances [17].


C. Fernandes et al, 2017, describe different decision
tree classifiers, which are used to forecast
students' proficiency. CHAID has the highest accuracy,
76.11%, followed by C4.5 with 73.13% [18].


S. Srivastava et al, 2013, worked on the performance
of classification algorithms; the results are compared
and evaluated on already existing datasets. The
accuracy of the SPRINT algorithm is higher and its
performance is satisfactory [19].


A. Lohani et al, 2016, worked on the comparative
analysis of the algorithms, and the result of the
analysis is shown graphically using ROC (Receiver
Operating Characteristic) curves. This paper shows
that better results are obtained if ensemble methods
are used, and that the C4.5 algorithm is not stable [20].


S. Devi, M. Sunadaram, 2016, discuss data mining,
its various research domains, and meta and tree
classifiers. The paper compares meta and tree
classifiers, and the analysis shows that the meta
classifier is more efficient than the tree
classifier [21].


S. Priya, M. Venila, 2017, discuss cancer diagnosis,
a field of healthcare, in which the disease is
diagnosed with the help of data mining classification
algorithms, evaluated on the basis of the correctly
and incorrectly classified instances [22].


K. Danjuma, A. Osofisan, 2014, comparatively analyze
various classification algorithms using the cross-fold
validation method and a set of performance metrics.
The analysis shows an accuracy of 97.4% for Naive
Bayes and 96.6% for Multilayer Perceptron, while J48
has a lower accuracy of 93.5% [23].


N. Kaur, N. Dokania, 2018, worked on the
comparative analysis of k-means and y-means on
the basis of features such as efficiency, the number
of clusters an item belongs to, performance, shape of
cluster, detection rate, etc. [24].


E. Sondakh, R. Pungus, 2017, worked on the
comparative analysis of three classification
algorithms to find the best suited algorithm for a
model. The resulting models show no significant
difference between the performance of Naive Bayes and
Decision Tree, while SVM shows the lowest performance
[25].


K. Kishore, M. Reddy, 2017, discuss data mining
and its different techniques. Two things are
explained: first, the comparison between different
datasets using one algorithm, and second, the
comparison of different algorithms using a single
dataset [26].


III. RESEARCH METHODOLOGY

In data mining, classification of large datasets is a
problem. Data mining has various techniques such as
classification, regression and clustering. This paper
mainly focuses on classification techniques, which
comprise various algorithms that help in classifying
the records. The dataset contains instances,
attributes and classes, which help in classifying the
records. Random Tree, J48 Decision Tree, Multilayer
Perceptron and Naive Bayes are the algorithms used
for the analysis of the classification techniques.


The research work mainly focuses on the comparative
analysis of the classification algorithms Naive Bayes,
Multilayer Perceptron, Random Tree and J48 on the
Chronic Kidney Disease dataset. The results of the
comparative analysis are analyzed to deduce the best
suited algorithm on the basis of accuracy, execution
time, correctly classified instances and incorrectly
classified instances.


A. DATASET USED: In this research work we
   have used the Chronic Kidney Disease (CKD)
   dataset. The main focus of this research is the
   performance and evaluation of the Naive Bayes,
   Multilayer Perceptron, J48 and Random Tree
   algorithms. This dataset contains 400 instances
   and 25 attributes. The WEKA data mining tool is
   used for analyzing the performance of the
   classification algorithms.
Chronic Kidney Disease is a disease in which the kidney loses its function over a period of months or years.
Clinical diagnosis of Chronic Kidney Disease is done with the help of urine and blood samples as well as a
sample of kidney tissue. Early diagnosis and detection of the disease is very important so that kidney
failure can be prevented. Data mining and analytics techniques are applied to historical patient data and
diagnosis records to predict chronic kidney disease. Using the CKD dataset, comparative analysis of the
algorithms is done on the basis of accuracy, correctly classified instances, incorrectly classified
instances, error rate and execution time [28].


Relevant Information:

age - age
bp - blood pressure
sg - specific gravity
al - albumin
su - sugar
rbc - red blood cells
pc - pus cell
pcc - pus cell clumps
ba - bacteria
bgr - blood glucose random
bu - blood urea
sc - serum creatinine
sod - sodium
pot - potassium
hemo - hemoglobin
pcv - packed cell volume
wc - white blood cell count
rc - red blood cell count
htn - hypertension
dm - diabetes mellitus
cad - coronary artery disease
appet - appetite
pe - pedal edema
ane - anemia
class - class


Figure 2: Abbreviations used in dataset


Number of Instances: 400 (250 ckd, 150 notckd)
Number of Attributes: 24 + class = 25 (11 numeric, 14 nominal)
Attribute Information:

1. Age (numerical): age in years
2. Blood Pressure (numerical): bp in mm/Hg
3. Specific Gravity (nominal): sg - (1.005,1.010,1.015,1.020,1.025)
4. Albumin (nominal): al - (0,1,2,3,4,5)
5. Sugar (nominal): su - (0,1,2,3,4,5)
6. Red Blood Cells (nominal): rbc - (normal,abnormal)
7. Pus Cell (nominal): pc - (normal,abnormal)
8. Pus Cell Clumps (nominal): pcc - (present,notpresent)
9. Bacteria (nominal): ba - (present,notpresent)
10. Blood Glucose Random (numerical): bgr in mgs/dl
11. Blood Urea (numerical): bu in mgs/dl
12. Serum Creatinine (numerical): sc in mgs/dl
13. Sodium (numerical): sod in mEq/L
14. Potassium (numerical): pot in mEq/L
15. Hemoglobin (numerical): hemo in gms
16. Packed Cell Volume (numerical): pcv
17. White Blood Cell Count (numerical): wc in cells/cumm
18. Red Blood Cell Count (numerical): rc in millions/cmm
19. Hypertension (nominal): htn - (yes,no)
20. Diabetes Mellitus (nominal): dm - (yes,no)
21. Coronary Artery Disease (nominal): cad - (yes,no)
22. Appetite (nominal): appet - (good,poor)
23. Pedal Edema (nominal): pe - (yes,no)
24. Anemia (nominal): ane - (yes,no)
25. Class (nominal): class - (ckd,notckd)


Figure 3: Instances and Attributes in Dataset
B. CLASSIFICATION: Classification is a data mining technique based on supervised learning, with broad
   applications. It assigns each item of a set to one of a predefined set of classes or groups.
   Among all the techniques in data mining, classification is the foremost. Classification inspects the
   dataset and considers each instance, assigning it to the appropriate class such that the model has the
   least error [29].


Classification is used to extract models that describe the important data classes within a particular dataset.
Classification has two stages: first, an algorithm is applied to construct the model; afterwards, the
constructed model is tested against a predefined dataset to measure its performance and
accuracy. In this research work we have analyzed the Naive Bayes, Random Tree, J48
and Multilayer Perceptron algorithms on the Chronic Kidney Disease dataset. These algorithms are briefly
described below:


NAIVE BAYES: Naive Bayes is one of the classifier algorithms in data mining under the Bayes class; it is an
application of Bayes' theorem. In a Bayesian classifier, the probable result is calculated according to the
input. Naive Bayes assumes that each feature of a class is unrelated to any other feature of the class [29].
The working of the Naive Bayes algorithm is described as follows:


$$p(d \mid b) = \frac{p(b \mid d)\, p(d)}{p(b)}$$

$$p(d \mid b) \propto p(b_1 \mid d) \times p(b_2 \mid d) \times \cdots \times p(b_n \mid d) \times p(d)$$

Figure 4: Naive Bayes Theorem [30]


> p(d|b): posterior probability of the class (target) given the predictor (attribute).
> p(d): prior probability of the class.
> p(b|d): likelihood, which is the probability of the predictor given the class.
> p(b): prior probability of the predictor.
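
As an illustration of the formula, the following minimal sketch estimates p(d), p(b_i|d) and the normalized posterior p(d|b) by simple counting over nominal attributes. It is not WEKA's NaiveBayes implementation; the toy rows (hypertension, appetite) and the function names are hypothetical, and no smoothing is applied, so an unseen attribute value gives a likelihood of zero.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def posterior(row, class_counts, value_counts):
    """Return the normalized posterior p(d | b) for every class d."""
    total = sum(class_counts.values())
    scores = {}
    for d, n_d in class_counts.items():
        score = n_d / total  # prior p(d)
        for i, value in enumerate(row):
            score *= value_counts[(i, d)][value] / n_d  # likelihood p(b_i | d)
        scores[d] = score
    evidence = sum(scores.values())  # p(b); assumed non-zero in this sketch
    return {d: s / evidence for d, s in scores.items()}


# Hypothetical toy data: (htn, appet) -> class
rows = [("yes", "poor"), ("yes", "good"), ("no", "poor"),
        ("no", "good"), ("yes", "good"), ("no", "poor")]
labels = ["ckd", "ckd", "ckd", "notckd", "notckd", "notckd"]

class_counts, value_counts = train(rows, labels)
post = posterior(("yes", "poor"), class_counts, value_counts)
```

For the query ("yes", "poor") the unnormalized scores are 0.5 × 2/3 × 2/3 for ckd and 0.5 × 1/3 × 1/3 for notckd, so dividing by their sum p(b) gives the posterior of each class.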


J48: The J48 classifier is WEKA's implementation of the C4.5 classifier. J48 produces a decision tree as its
result. A decision tree is a tree-like structure with different nodes. Each node contains a test, and each
test leads to a particular outcome [10]. J48 follows a simple algorithm, which works as follows:


To classify new items, a decision tree is constructed from the attribute values of the available training
data. The attribute that separates the distinct instances most clearly is identified [30], since it yields
the highest information gain from the data instances [30]. The dataset is partitioned into mutually exclusive
regions, where each region has its own label, values and associated actions to describe its data points. This
partitioning helps in deciding which branch of the tree leads to a particular resulting node [10].
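
The attribute selection idea can be sketched as follows. This is a minimal illustration of entropy and information gain on nominal attributes, not the J48 code itself; C4.5 additionally normalizes the gain into a gain ratio, which is omitted here. The toy attribute values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in entropy obtained by splitting the labels on one attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: two nominal attributes and the class label.
labels = ["ckd", "ckd", "notckd", "notckd"]
htn = ["yes", "yes", "no", "no"]            # separates the classes perfectly
appet = ["poor", "good", "good", "good"]    # separates them only partly

gain_htn = information_gain(htn, labels)
gain_appet = information_gain(appet, labels)
best_attribute = "htn" if gain_htn >= gain_appet else "appet"
```

In this toy example the hypertension attribute would be chosen for the root node, because splitting on it leaves no class uncertainty in either branch.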


MULTILAYER PERCEPTRON: Linearly separable problems can be classified by a single layer perceptron.
For problems that are not linearly separable, more than one layer is used, which gives a multilayer network.
The multilayer (feed forward) network has multiple layers, including one or more hidden layers that contain
hidden neurons. Using past data, the network learns to map inputs correctly to outputs, so that it can be
applied when the desired output is not known. For each input, the output produced by the neural network is
compared with the desired output so as to compute the error [10].
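
A forward pass and the error computation can be sketched as below. The sketch assumes one hidden layer, sigmoid activations and a squared error measure; the weights and inputs are hypothetical, and no training (weight update) is shown.

```python
from math import exp


def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: sigmoid(W x + b), one weight row per neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, w1, b1, w2, b2):
    """Input layer -> hidden layer (weight matrix 1) -> output layer (weight matrix 2)."""
    hidden = layer(inputs, w1, b1)
    return layer(hidden, w2, b2)


def squared_error(output, desired):
    """Half the sum of squared differences between network and desired output."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, desired))


# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output neuron.
w1 = [[0.5, -0.4], [0.3, 0.8]]
b1 = [0.0, 0.1]
w2 = [[1.2, -0.7]]
b2 = [0.05]

output = forward([1.0, 0.0], w1, b1, w2, b2)
error = squared_error(output, [1.0])
```

During training, an algorithm such as backpropagation would use this error to adjust both weight matrices; that step is left out of the sketch.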


@ IJTSRD | Unique Paper ID —-IJTSRD50568 | Volume-—6 | Issue—5 | July-August 2022 Page 862 


International Journal of Trend in Scientific Research and Development @ www.ijtsrd.com eISSN: 2456-6470 


The multilayer network is shown in the figure below:


[Diagram: Input values -> Input layer -> Weight matrix 1 -> Hidden layer -> Weight matrix 2 -> Output layer -> Output values]


Figure 5: Multilayer Perceptron


RANDOM TREE: Random Tree is a supervised learning algorithm. This learning algorithm produces
many individual learners. Random Trees were introduced by Leo Breiman and Adele Cutler. A random tree
classifier is a group of tree predictors, which is known as a forest. The algorithm works as follows: the
random tree classifier takes an input feature vector, classifies it with each tree in the forest, and outputs
the class label that receives the majority of votes. Two machine learning ideas are combined in this
approach: single model trees and random forest ideas.
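
The voting step described above can be sketched as follows. The three "trees" are hypothetical stand-ins written as simple functions; their attributes and thresholds are illustrative only and are neither learned from the CKD dataset nor clinically meaningful.

```python
from collections import Counter


def majority_vote(trees, feature_vector):
    """Classify with every tree and return the label with the most votes."""
    votes = Counter(tree(feature_vector) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical tree predictors; thresholds are made up for illustration.
def tree_a(x):
    return "ckd" if x["sc"] > 1.2 else "notckd"


def tree_b(x):
    return "ckd" if x["hemo"] < 12.0 else "notckd"


def tree_c(x):
    return "ckd" if x["htn"] == "yes" else "notckd"


patient = {"sc": 2.0, "hemo": 10.0, "htn": "no"}
predicted = majority_vote([tree_a, tree_b, tree_c], patient)
```

Here two of the three trees vote for "ckd", so that label is returned.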


TOOL USED: WEKA (Waikato Environment for Knowledge Analysis) was developed at the University of Waikato
in New Zealand. This machine learning software is written in Java. WEKA is a collection of
visualization tools and algorithms for predictive modeling [27]. Different types of data mining algorithms can
be tested on different types of datasets. The techniques supported by WEKA are Data Preprocessing,
Classification, Clustering, Visualization, Regression and Feature Selection [21]. There are 5 interfaces in
the tool; the main user interface is the Explorer, with which we work, and the other interfaces provide the
same functionality as the Explorer [27].


IV. EXPERIMENTAL RESULTS 

This research work analyses the performance of different classification algorithms on the Chronic Kidney
Disease dataset. The classifiers are compared using the criteria accuracy, correctly classified instances,
incorrectly classified instances, error rate and execution time, and their application domains are also
discussed. Models for each algorithm are constructed using two methods: Cross Validation with 10 folds, in
which 9 folds are used for training and 1 fold for testing, and Percentage Split, in which 60% of the dataset
is used for training and 40% for testing.
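
The two testing beds can be sketched in terms of instance indices. This is a simplified illustration, not WEKA's evaluation code: it omits the shuffling and stratification that evaluation tools typically apply before splitting, and the function names are hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists; every instance is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


def percentage_split(n, train_fraction):
    """Use the first part of the data for training and the rest for testing."""
    cut = round(n * train_fraction)
    return list(range(cut)), list(range(cut, n))


# 400 instances, as in the CKD dataset.
folds = list(k_fold_indices(400, 10))
train_idx, test_idx = percentage_split(400, 0.6)
```

With 400 instances, each cross validation round trains on 360 instances and tests on 40, while the 60%/40% percentage split trains on 240 instances and tests on 160.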


Figures are shown for the comparison of the different classifiers on the CKD dataset using the 10-fold cross
validation testing bed. Applications of these classifiers are also listed in the table. According to the
table, the execution time of the Random Tree algorithm is the lowest at 0.02 seconds, followed by Naive Bayes
with 0.03 seconds and J48 with 0.1 seconds; Multilayer Perceptron took much longer to execute, at
8.97 seconds. The accuracy of Multilayer Perceptron is 99.75%, that of J48 is 99%, Random Tree 95.5%
and Naive Bayes 95%. The accuracies of the algorithms do not differ much. Hence,
according to the data, Multilayer Perceptron is the most accurate algorithm in the case of the 10-fold cross
validation method.
[WEKA result output panels: Multilayer Perceptron, Random Tree]

Figure 6: Result evaluation for different classification algorithms on CKD dataset


For Chronic Kidney Disease


Classifier     | Naive Bayes | Multilayer Perceptron | Random Tree | J48
Testing Bed    | Cross Validation | Cross Validation | Cross Validation | Cross Validation
Applications   | Text classification, Spam filtering, Online Application, Hybrid recommender system | Speech recognition, Image recognition, Machine translation software [32] | Machine learning, Genetic algorithm, Fault diagnosis, Rotating Machinery [33] | Emotion recognition, Verbal column pathologies
Execution Time | 0.03 seconds | 8.97 seconds | 0.02 seconds | 0.1 seconds
Accuracy       | 95% | 99.75% | 95.5% | 99%


Table 1: Comparison of classifiers for CKD dataset using cross validation testing bed


[Bar chart of accuracy and execution time for NB, MP, RT and J48]


Figure 7: Graphical representation of the accuracy and execution time of the different algorithms using the
cross validation method.
In the graph, NB stands for Naive Bayes, MP for Multilayer Perceptron and RT for Random Tree.
The number of correctly classified instances is 380 for Naive Bayes, 399 for Multilayer Perceptron, 382 for
Random Tree and 396 for J48. The number of incorrectly classified instances is 20 for Naive Bayes, 1 for
Multilayer Perceptron, 18 for Random Tree and 4 for J48. The analysis for the CKD dataset is then carried out
using the percentage split method.
[WEKA output for the percentage split test, 160 test instances. NAIVE BAYES: time taken to test model on test split 0 seconds, 152 correctly and 8 incorrectly classified instances. J48 Decision Tree: time taken to test model on test split 0.01 seconds, 160 correctly and 0 incorrectly classified instances, Kappa statistic 1.]
For Chronic Kidney Disease

Classifier     | Naive Bayes | Multilayer Perceptron | Random Tree | J48
Testing Bed    | Percentage Split | Percentage Split | Percentage Split | Percentage Split
Execution Time | 0 seconds | 0 seconds | 0 seconds | 0.01 seconds
Accuracy       | 95% | 98.125% | 96.25% | 100%


Table 2: Comparison of classifiers for CKD dataset using percentage split method


According to the percentage split test method, Naive Bayes, Random Tree and Multilayer Perceptron took
0 seconds for execution while J48 took 0.01 seconds. The accuracy of the J48 algorithm comes out to be 100%,
that of Multilayer Perceptron 98.125%, Random Tree 96.25% and Naive Bayes 95%. The number of correctly
classified instances is 152 for Naive Bayes, 157 for Multilayer Perceptron, 154 for Random Tree and 160 for
J48. The number of incorrectly classified instances is 8 for Naive Bayes, 3 for Multilayer Perceptron, 6 for
Random Tree and 0 for J48.
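
The reported accuracies follow directly from the counts of correctly and incorrectly classified instances. The short check below recomputes them; the counts are taken from the text, while the function and variable names are hypothetical.

```python
def accuracy_percent(correct, incorrect):
    """Accuracy = correctly classified instances / all instances, in percent."""
    return 100.0 * correct / (correct + incorrect)


# (correctly classified, incorrectly classified) as reported in the text.
cross_validation = {
    "Naive Bayes": (380, 20),
    "Multilayer Perceptron": (399, 1),
    "Random Tree": (382, 18),
    "J48": (396, 4),
}
percentage_split_counts = {
    "Naive Bayes": (152, 8),
    "Multilayer Perceptron": (157, 3),
    "Random Tree": (154, 6),
    "J48": (160, 0),
}

cv_accuracy = {name: accuracy_percent(c, i) for name, (c, i) in cross_validation.items()}
split_accuracy = {name: accuracy_percent(c, i) for name, (c, i) in percentage_split_counts.items()}
# The error rate is the complement of the accuracy.
split_error_rate = {name: 100.0 - acc for name, acc in split_accuracy.items()}
```

The cross validation counts sum to 400 instances and the percentage split counts to 160, which is the 40% test portion of the dataset.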




[Bar chart of accuracy and execution time for NB, MP, RT and J48]


Figure 8: Graphical representation of the accuracy and execution time of the different algorithms using the
percentage split method


The chart shows the accuracy of the different algorithms under the percentage split method. The abbreviations
in the chart stand for Naive Bayes (NB), Multilayer Perceptron (MP) and Random Tree (RT).


Graphical representations of the correctly and incorrectly classified instances for each classifier are
shown below:


[Bar charts; legend: correctly classified instances, incorrectly classified instances]


Figure 10: Correctly and incorrectly classified instances in case of Cross Validation


From the graphs it can be seen that there is no large difference between the performance of the
classification algorithms; all of them perform well on the chronic kidney disease dataset. On the basis of
the graph analysis, the Multilayer Perceptron classifier is the most accurate when using the cross validation
method and the J48 classifier is the most accurate when using percentage split.


V. CONCLUSION 

Comparison and investigation of the performance
of various classification algorithms is done using
different criteria: accuracy, execution time,
correctly classified instances, incorrectly classified
instances and error rate. According to the result
evaluation, it can be concluded that Multilayer
Perceptron is the most accurate, with 99.75%, when
the 10-fold cross validation method is applied to the
CKD dataset, and that for the Percentage Split method
the J48 algorithm is the most accurate, with 100%
accuracy. From Figures 7 and 8 it can be seen that the
accuracies of the algorithms do not differ
significantly. Hence the type and size of the dataset
are the factors on which an algorithm's performance
depends. Further result evaluation can be done on the
performance of other classification techniques with
larger dataset samples. Techniques such as clustering,
association and sequential patterns can be used to draw
more efficient results apart from the classification
technique.
VI. FUTURE WORK

In future, the focus will be on how to improve the
classifiers' performance so that classification
techniques require less time to execute. To enhance
the performance, different classification algorithms
can be used together.